ImpactMojo

Econometrics 101 Formula Reference

Essential Formulas for Causal Inference & Policy Evaluation

🎯 Quick Method Selection Guide

Method         When to Use                           Key Assumption         Estimate
OLS            Selection on observables              E[ε|X] = 0             ATE (if CIA holds)
IV/2SLS        Endogenous treatment                  Exclusion restriction  LATE for compliers
DiD            Policy changes over time              Parallel trends        ATT for treated units
RDD            Assignment by threshold               No manipulation        LATE at cutoff
Fixed Effects  Panel data, unobserved heterogeneity  Strict exogeneity      Within estimator

📊 Basic Regression & OLS

Population Model

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
β₁ = marginal effect of X₁ on Y, holding other X's constant

OLS Estimator

β̂ = (X'X)⁻¹X'Y
Minimizes sum of squared residuals

Standard Errors

SE(β̂) = √diag{σ̂²(X'X)⁻¹}
where σ̂² = RSS/(n-k-1) estimates σ² (k regressors plus an intercept)

t-statistic

t = β̂/SE(β̂)
H₀: β = 0, reject if |t| > 1.96 (two-sided, α = 0.05, large-sample normal approximation)
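The estimator, standard error, and t-statistic above can be traced by hand for a single regressor. The following is a minimal sketch in plain Python, assuming one regressor plus an intercept and classical (homoskedastic) errors; the function name `ols_simple` and the data are made up for illustration and do not come from this sheet.

```python
import math

def ols_simple(x, y):
    """OLS for y = b0 + b1*x + e with classical (homoskedastic) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx              # slope: scalar form of (X'X)^-1 X'Y
    b0 = y_bar - b1 * x_bar     # intercept

    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # RSS / (n - k - 1) with k = 1 regressor
    se_b1 = math.sqrt(sigma2_hat / sxx)
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1}

# Illustrative data (hypothetical, not from the reference sheet)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = ols_simple(x, y)
print(fit)
```

With these numbers the slope is about 1.99 and the t-statistic is far above 1.96, so H₀: β = 0 would be rejected. With only five observations, a t distribution with n-k-1 degrees of freedom would be the more careful reference than 1.96.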
OLS Assumptions (Gauss-Markov)
  • A1: Linear in parameters
  • A2: Random sampling
  • A3: No perfect collinearity
  • A4: Zero conditional mean: E[ε|X] = 0
  • A5: Homoskedasticity: Var(ε|X) = σ²
  • A6: Normality: ε|X ~ N(0, σ²) [not part of Gauss-Markov; needed only for exact finite-sample inference]
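Robust SEs (Heteroskedasticity)
Stata: reg y x, robust
R: lm_robust(y ~ x, data) (estimatr package; defaults to HC2, add se_type = "stata" to match Stata's HC1)
Clustered SEs
Stata: reg y x, cluster(id)
R: lm_robust(y ~ x, data, clusters = id) (defaults to CR2; se_type = "stata" matches Stata)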
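To make the difference between classical and robust standard errors concrete, here is a rough sketch of the HC1 sandwich formula for the slope in a one-regressor model. It assumes the HC1 small-sample correction n/(n-k-1), which is what Stata's robust option applies; the function name and data are hypothetical.

```python
import math

def slope_standard_errors(x, y):
    """Return (classical SE, HC1 robust SE) for the slope in y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]          # demeaned regressor
    sxx = sum(d ** 2 for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Classical: one common error variance for all observations
    classical = math.sqrt(sum(e ** 2 for e in resid) / (n - 2) / sxx)

    # HC0 sandwich: each squared residual is weighted by its own (x - x_bar)^2
    hc0_var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    hc1 = math.sqrt(hc0_var * n / (n - 2))  # HC1 degrees-of-freedom correction
    return classical, hc1

# Illustrative data (hypothetical)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
classical_se, hc1_se = slope_standard_errors(x, y)
print(classical_se, hc1_se)
```

The two numbers differ because the robust version does not impose Var(ε|X) = σ². Neither is guaranteed to be larger in a given sample.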

🔧 Instrumental Variables (IV/2SLS)

Structural Equation

Y = β₀ + β₁X + β₂Z + ε
X is endogenous, Z are exogenous controls

First Stage

X = π₀ + π₁IV + π₂Z + v
IV predicts endogenous variable X

Reduced Form

Y = γ₀ + γ₁IV + γ₂Z + u
Total effect of IV on outcome

IV Estimator (Wald)

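β̂₁ᴵⱽ = γ̂₁/π̂₁ = Cov(Y,IV)/Cov(X,IV)
Ratio of reduced form to first stage; the covariance form applies with a single instrument and no controls Z (otherwise after partialling Z out)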
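The Wald ratio can be checked numerically in a few lines. This sketch assumes a single binary instrument and no controls, and uses invented data; `cov`, `slope`, and `wald_iv` are illustrative helper names.

```python
def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

def slope(dep, reg):
    """OLS slope from regressing dep on reg (with an intercept)."""
    return cov(dep, reg) / cov(reg, reg)

def wald_iv(y, x, iv):
    """IV estimate of the effect of x on y using a single instrument."""
    return cov(y, iv) / cov(x, iv)

# Illustrative data (hypothetical): iv shifts take-up of x from 25% to 75%
iv = [0, 0, 0, 0, 1, 1, 1, 1]
x = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1, 2, 1, 4, 2, 5, 4, 5]

gamma1 = slope(y, iv)   # reduced form: effect of iv on y
pi1 = slope(x, iv)      # first stage: effect of iv on x
beta_iv = wald_iv(y, x, iv)
print(gamma1, pi1, beta_iv)
```

Here the reduced form is 2 and the first stage is 0.5, so both routes give an IV estimate of 4.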
IV Assumptions
  • Relevance: Cov(IV, X) ≠ 0 (first stage F > 10)
  • Exclusion: Cov(IV, ε) = 0 (only affects Y through X)
  • Monotonicity: Same direction response for all units
  • Independence: IV uncorrelated with unobservables
Key Diagnostic Tests
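  • Weak IV Test: First-stage F-stat > 10 (Staiger-Stock rule of thumb); Stock-Yogo critical values give a formal test
  • Overidentification: Hansen J-test (needs more instruments than endogenous regressors; p > 0.05 means instrument validity is not rejected)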
  • Endogeneity Test: Hausman test (compare OLS vs IV)
Stata: ivregress 2sls y (x = iv) z, first
R: ivreg(y ~ x + z | iv + z, data)
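For the weak-instrument check listed above, the first-stage F-statistic with one instrument and no controls is simply the squared t-statistic on the instrument. The sketch below assumes that simple case and classical standard errors; the function name and data are hypothetical.

```python
import math

def first_stage_f(x, iv):
    """F-statistic for the instrument in the first stage x = pi0 + pi1*iv + v."""
    n = len(x)
    iv_bar = sum(iv) / n
    x_bar = sum(x) / n
    s_zz = sum((zi - iv_bar) ** 2 for zi in iv)
    pi1 = sum((zi - iv_bar) * (xi - x_bar) for zi, xi in zip(iv, x)) / s_zz
    pi0 = x_bar - pi1 * iv_bar

    rss = sum((xi - pi0 - pi1 * zi) ** 2 for zi, xi in zip(iv, x))
    se_pi1 = math.sqrt(rss / (n - 2) / s_zz)
    t = pi1 / se_pi1
    return t ** 2  # with a single instrument, F equals t squared

# Illustrative data (hypothetical)
iv = [0, 0, 0, 0, 1, 1, 1, 1]
x = [0, 0, 0, 1, 0, 1, 1, 1]
f_stat = first_stage_f(x, iv)
print(f_stat, "weak" if f_stat < 10 else "passes rule of thumb")
```

In this tiny sample the instrument moves x in the expected direction, yet F is only 2, well short of the F > 10 rule of thumb.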

📈 Difference-in-Differences (DiD)

DiD Estimator

δ̂ = (Ȳ₁ᵀ - Ȳ₀ᵀ) - (Ȳ₁ᶜ - Ȳ₀ᶜ)
Treatment effect = difference in differences
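As a worked illustration of the estimator above, this sketch computes the four cell means and differences them. It assumes a two-group, two-period design with made-up outcomes; `did_2x2` is a hypothetical helper name.

```python
from statistics import mean

def did_2x2(obs):
    """2x2 DiD from a list of (treat, post, y) tuples with treat, post in {0, 1}."""
    def cell_mean(treat, post):
        return mean(y for t, p, y in obs if t == treat and p == post)

    change_treated = cell_mean(1, 1) - cell_mean(1, 0)
    change_control = cell_mean(0, 1) - cell_mean(0, 0)
    return change_treated - change_control

# Illustrative data (hypothetical): (treat, post, outcome)
obs = [
    (1, 0, 10), (1, 0, 12), (1, 1, 18), (1, 1, 20),  # treated: mean 11 -> 19
    (0, 0, 8), (0, 0, 10), (0, 1, 11), (0, 1, 13),   # control: mean 9 -> 12
]
delta_hat = did_2x2(obs)
print(delta_hat)
```

The treated group rises by 8 and the control group by 3, so the DiD estimate is 5.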

Regression Specification

Y_{it} = β₀ + β₁Treat_i + β₂Post_t + β₃(Treat_i×Post_t) + X_{it}'γ + ε_{it}
β₃ is the DiD estimate

With Fixed Effects

Y_{it} = α_i + λ_t + β(Treat_i×Post_t) + X_{it}'γ + ε_{it}
α_i = unit FE, λ_t = time FE

Event Study

Y_{it} = α_i + λ_t + Σₖ βₖD_{i,t+k} + X_{it}'γ + ε_{it}
βₖ = effect k periods relative to treatment
DiD Assumptions
  • Parallel Trends: E[Y₁(0) - Y₀(0) | X, D=1] = E[Y₁(0) - Y₀(0) | X, D=0], where Y_t(0) is the untreated potential outcome in period t
  • No Anticipation: Treatment doesn't affect pre-treatment outcomes
  • SUTVA: No spillovers between units
  • Common Shocks: Time effects same for treated/control
Validity Tests
  • Pre-Trends Test: Test β_{-2} = β_{-1} = 0 in event study
  • Placebo Test: No effect on unaffected outcomes
  • Balance Test: Pre-treatment characteristics similar
Stata: reghdfe y treat##post, absorb(id time)
R: feols(y ~ treat:post | id + time, data)

📏 Regression Discontinuity (RDD)

Sharp RDD

Y_i = α + βT_i + f(X_i - c) + ε_i
T_i = 1 if X_i ≥ c, β = treatment effect

Fuzzy RDD (First Stage)

D_i = γ + δT_i + g(X_i - c) + v_i
D_i = actual treatment, δ = jump in treatment probability at the cutoff (compliance rate)

Local Linear Estimation

Y_i = α + β·1{X_i ≥ c} + γ(X_i - c) + δ(X_i - c)·1{X_i ≥ c} + ε_i
Linear slopes on each side of cutoff
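The local linear specification is equivalent to fitting a separate line on each side of the cutoff and comparing the two intercepts at X = c. The sketch below does exactly that, assuming a uniform kernel and a bandwidth h chosen by hand (not the optimal bandwidth); function names and data are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the OLS line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b

def local_linear_rd(x, y, cutoff, h):
    """Sharp RD estimate: gap between right and left intercepts at the cutoff."""
    # Center the running variable so each intercept is the fitted value at the cutoff
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

# Illustrative noiseless data (hypothetical): a jump of 2 at cutoff 0
x = [-3, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 3]
y = [1 + 0.5 * xi if xi < 0 else 3 + 0.2 * xi for xi in x]
beta_hat = local_linear_rd(x, y, cutoff=0, h=2)
print(beta_hat)
```

The points at -3 and 3 fall outside the bandwidth and are ignored. Because the data are exactly linear on each side, the estimate recovers the jump of 2. In applied work a package such as rdrobust would also handle kernel weighting, bandwidth selection, and bias-corrected inference.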

Optimal Bandwidth

h* = C_K[(2σ²)/(f(c)·m₂²)]^(1/5)·n^(-1/5)
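Imbens-Kalyanaraman bandwidth (simplified: equal variance on both sides, regularization term omitted)
C_K = kernel constant, σ² = conditional variance of Y at c, f(c) = density of X at c, m₂ = difference in second derivatives of E[Y|X] across the cutoff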
RDD Assumptions
  • No Manipulation: Smooth density of running variable at cutoff
  • Continuity: E[Y₀|X] continuous at cutoff
  • Local Randomization: As-good-as-random near cutoff
Validity Tests
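  • McCrary Test: H₀: ln(f₊/f₋) = 0, i.e. no jump in the density of the running variable at the cutoff (the estimated log difference is asymptotically normal)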
  • Covariate Balance: No jumps in pre-treatment variables
  • Bandwidth Sensitivity: Results stable across bandwidths
  • Placebo Cutoffs: No effects at fake thresholds
Stata: rdrobust y x, c(cutoff)
R: rdrobust(y, x, c = cutoff)

📋 Panel Data & Fixed Effects

One-Way Fixed Effects

Y_{it} = α_i + βX_{it} + ε_{it}
α_i = individual-specific effect

Two-Way Fixed Effects

Y_{it} = α_i + λ_t + βX_{it} + ε_{it}
Controls individual + time effects

Within Estimator

β̂_{FE} = [Σᵢ Σₜ (X_{it} - X̄_i)(X_{it} - X̄_i)']⁻¹ Σᵢ Σₜ (X_{it} - X̄_i)(Y_{it} - Ȳ_i)
Uses within-unit variation only
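For a single regressor, the within estimator reduces to demeaning X and Y by unit and taking a ratio of sums. The sketch below assumes a scalar X and a balanced toy panel; `within_estimator` and the data are hypothetical.

```python
from collections import defaultdict

def within_estimator(ids, x, y):
    """Fixed-effects (within) slope for scalar x: demean by unit, then OLS."""
    groups = defaultdict(list)
    for i, xi, yi in zip(ids, x, y):
        groups[i].append((xi, yi))

    num = den = 0.0
    for rows in groups.values():
        x_bar = sum(r[0] for r in rows) / len(rows)
        y_bar = sum(r[1] for r in rows) / len(rows)
        for xi, yi in rows:
            num += (xi - x_bar) * (yi - y_bar)
            den += (xi - x_bar) ** 2
    return num / den

# Illustrative panel (hypothetical): y = alpha_i + 2*x, alpha_A = 10, alpha_B = 0
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 8, 10, 12]

beta_fe = within_estimator(ids, x, y)
# Treating everything as one unit gives the pooled OLS slope for comparison
beta_pooled = within_estimator(["all"] * len(x), x, y)
print(beta_fe, beta_pooled)
```

Because α_i is negatively correlated with X in this example, the pooled slope comes out negative while the within estimator recovers the true slope of 2.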

Random Effects

Y_{it} = β₀ + βX_{it} + α_i + ε_{it}
α_i ~ N(0, σ²_α), uncorrelated with X
Fixed Effects Assumptions
  • Strict Exogeneity: E[ε_{it}|X_i, α_i] = 0 for all t
  • No Time-Varying Confounders: No unit-specific time-varying omitted factors; time FE absorb only shocks common to all units
  • Sufficient Variation: X_{it} varies within units over time
Panel Data Tests
  • Hausman Test: H₀: α_i uncorrelated with X, so RE is consistent and efficient (use RE if p > 0.05, otherwise FE)
  • F-test for FE: H₀: α_i = α for all i
  • Serial Correlation: Wooldridge test for AR(1)
Stata: xtreg y x, fe (one-way; add i.year for two-way FE)
R: feols(y ~ x | id, data) (one-way; use | id + year for two-way FE)

📊 Standard Error Types & When to Use

SE Type       When to Use                 Stata Command         R Command
Classical     Homoskedastic errors        reg y x               lm(y ~ x)
Robust        Heteroskedasticity          reg y x, robust       lm_robust(y ~ x)
Clustered     Within-cluster correlation  reg y x, cluster(id)  lm_robust(y ~ x, clusters = id)
Bootstrap     Non-standard distributions  bootstrap: reg y x    boot package
Panel Robust  Panel heteroskedasticity    xtreg y x, fe robust  feols(y ~ x | id, vcov = "hetero")
🎯 Critical Values & Rules of Thumb